bradley-terry model
- Asia > Japan (0.05)
- North America > United States > Virginia (0.04)
- North America > United States > Hawaii (0.04)
- Government (1.00)
- Education (0.93)
- Law (0.69)
Model inference for ranking from pairwise comparisons
Catalina, Daniel Sánchez, Cantwell, George T.
We consider the problem of ranking objects from noisy pairwise comparisons, for example, ranking tennis players from the outcomes of matches. We follow a standard approach to this problem and assume that each object has an unobserved strength and that the outcome of each comparison depends probabilistically on the strengths of the comparands. However, we do not assume that we know a priori how strengths affect outcomes. Instead, we present an efficient algorithm for simultaneously inferring both the unobserved strengths and the function that maps strengths to probabilities. Although this problem is under-constrained, we present experimental evidence that the conclusions of our Bayesian approach are robust to different model specifications. We include several case studies to exemplify the method on real-world data sets.
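For context, the "standard approach" mentioned in this abstract usually fixes the strength-to-probability map in advance, and the Bradley-Terry model is the most common choice: item i beats item j with probability p_i / (p_i + p_j). The sketch below fits those strengths by maximum likelihood with the classic iterative (Zermelo / minorize-maximize) update. It is not the authors' algorithm, which also infers the map itself; the function names and the toy win counts are illustrative assumptions.

```python
def win_probability(p_i, p_j):
    """Bradley-Terry probability that an item with strength p_i beats one with p_j."""
    return p_i / (p_i + p_j)


def fit_bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Estimate strengths from wins[i][j] = number of times item i beat item j.

    Assumes every item has at least one win and one loss, so the
    maximum-likelihood estimate exists and is finite.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Games played against j, weighted by the current strength estimates.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        # Strengths are only identified up to scale; fix the mean at 1.
        scale = n / sum(new_p)
        new_p = [x * scale for x in new_p]
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p


# Toy round-robin (hypothetical data): each pair played 10 games.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
print(strengths)
print(win_probability(strengths[0], strengths[1]))
```

At the fitted strengths, each item's expected number of wins under the model matches its observed number of wins, which is a convenient check on convergence.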
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
Maximizing the efficiency of human feedback in AI alignment: a comparative analysis
Chouliaras, Andreas, Chatzopoulos, Dimitris
Reinforcement Learning from Human Feedback (RLHF) relies on preference modeling to align machine learning systems with human values, yet the popular approach of random pair sampling with Bradley-Terry modeling is statistically limited and inefficient under constrained annotation budgets. In this work, we explore alternative sampling and evaluation strategies for preference inference in RLHF, drawing inspiration from areas such as game theory, statistics, and social choice theory. Our best-performing method, Swiss InfoGain, employs a Swiss tournament system with a proxy mutual-information-gain pairing rule, which significantly outperforms all other methods in constrained annotation budgets while also being more sample-efficient. Even in high-resource settings, we can identify superior alternatives to the Bradley-Terry baseline. Our experiments demonstrate that adaptive, resource-aware strategies reduce redundancy, enhance robustness, and yield statistically significant improvements in preference learning, highlighting the importance of balancing alignment quality with human workload in RLHF pipelines.
- North America > United States > Michigan (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
Learning from Group Comparisons: Exploiting Higher Order Interactions
Yao Li, Minhao Cheng, Kevin Fujii, Fushing Hsieh, Cho-Jui Hsieh
We study the problem of learning from group comparisons, with applications in predicting outcomes of sports and online games. Most previous work in this area focuses on learning individual effects: it assumes each player has an underlying score, and the "ability" of the team is modeled by the sum of team
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > Yolo County > Davis (0.05)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Efficient Portfolio Selection through Preference Aggregation with Quicksort and the Bradley-Terry Model
Ge, Yurun, Böttcher, Lucas, Chou, Tom, D'Orsogna, Maria R.
How to allocate limited resources to projects that will yield the greatest long-term benefits is a problem that often arises in decision-making under uncertainty. For example, organizations may need to evaluate and select innovation projects with risky returns. Similarly, when allocating resources to research projects, funding agencies are tasked with identifying the most promising proposals based on idiosyncratic criteria. Finally, in participatory budgeting, a local community may need to select a subset of public projects to fund. Regardless of context, agents must estimate the uncertain values of a potentially large number of projects. Developing parsimonious methods to compare these projects, and aggregating agent evaluations so that the overall benefit is maximized, are critical in assembling the best project portfolio. Unlike in standard sorting algorithms, evaluating projects on the basis of uncertain long-term benefits introduces additional complexities. We propose comparison rules based on Quicksort and the Bradley-Terry model, which connects rankings to pairwise "win" probabilities. In our model, each agent determines win probabilities of a pair of projects based on his or her specific evaluation of the projects' long-term benefit. The win probabilities are then appropriately aggregated and used to rank projects. Several of the methods we propose perform better than the two most effective aggregation methods currently available. Additionally, our methods can be combined with sampling techniques to significantly reduce the number of pairwise comparisons. We also discuss how the Bradley-Terry portfolio selection approach can be implemented in practice.
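To make the idea of a Quicksort driven by Bradley-Terry comparisons concrete, here is a minimal single-agent sketch. Each comparison against the pivot is a random draw in which a project with estimated value v_i beats one with value v_j with probability v_i / (v_i + v_j). The function names, the example values, and the random pivot rule are assumptions for illustration; the aggregation across agents that the abstract describes is not shown, and this should not be read as the authors' exact comparison rule.

```python
import random


def bt_win_probability(v_i, v_j):
    """Bradley-Terry probability that a project valued v_i beats one valued v_j."""
    return v_i / (v_i + v_j)


def noisy_quicksort(items, values, rng):
    """Rank items from best to worst using random Bradley-Terry comparisons.

    items  : list of project identifiers
    values : dict mapping identifier -> the agent's estimated value
             (assumed non-negative, with at most one zero)
    rng    : a random.Random instance, passed in for reproducibility
    """
    if len(items) <= 1:
        return list(items)
    pivot_index = rng.randrange(len(items))
    pivot = items[pivot_index]
    winners, losers = [], []
    for k, item in enumerate(items):
        if k == pivot_index:
            continue
        # One noisy comparison of this item against the pivot.
        if rng.random() < bt_win_probability(values[item], values[pivot]):
            winners.append(item)
        else:
            losers.append(item)
    return (
        noisy_quicksort(winners, values, rng)
        + [pivot]
        + noisy_quicksort(losers, values, rng)
    )


# Hypothetical project values held by one agent.
values = {"A": 8.0, "B": 4.0, "C": 2.0, "D": 1.0}
ranking = noisy_quicksort(list(values), values, random.Random(0))
print(ranking)
```

Because every comparison is random, repeated runs with different seeds can produce different rankings; that variability is what an aggregation step over agents or repeated sorts would have to average out.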
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Government (0.68)
- Leisure & Entertainment > Games (0.46)
Score-Based Density Estimation from Pairwise Comparisons
Mikkola, Petrus, Acerbi, Luigi, Klami, Arto
We study density estimation from pairwise comparisons, motivated by expert knowledge elicitation and learning from human feedback. We relate the unobserved target density to a tempered winner density (marginal density of preferred choices), learning the winner's score via score-matching. This allows estimating the target by 'de-tempering' the estimated winner density's score. We prove that the score vectors of the belief and the winner density are collinear, linked by a position-dependent tempering field. We give analytical formulas for this field and propose an estimator for it under the Bradley-Terry model. Using a diffusion model trained on tempered samples generated via score-scaled annealed Langevin dynamics, we can learn complex multivariate belief densities of simulated experts, from only hundreds to thousands of pairwise comparisons.
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Europe > Belgium > Flanders (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Efficient Bayesian Inference from Noisy Pairwise Comparisons
Aczel, Till, Theis, Lucas, Wattenhofer, Roger
Evaluating generative models is challenging because standard metrics often fail to reflect human preferences. Human evaluations are more reliable but costly and noisy, as participants vary in expertise, attention, and diligence. Pairwise comparisons improve consistency, yet aggregating them into overall quality scores requires careful modeling. Bradley-Terry-based methods update item scores from comparisons, but existing approaches either ignore rater variability or lack convergence guarantees, limiting robustness and interpretability. We introduce BBQ, a Bayesian Bradley-Terry variant that explicitly models rater quality, downweighting or removing unreliable participants, and provides guaranteed monotonic likelihood convergence through an Expectation-Maximization algorithm. Empirical results show that BBQ achieves faster convergence, well-calibrated uncertainty estimates, and more robust, interpretable rankings compared to baseline Bradley-Terry models, even with noisy or crowdsourced raters. This framework enables more reliable and cost-effective human evaluation of generative models.
- Europe > Switzerland (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > India (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Preference-Based Dynamic Ranking Structure Recognition
Lu, Nan, Shi, Jian, Tian, Xin-Yu
Preference-based data often appear complex and noisy but may conceal underlying homogeneous structures. This paper introduces a novel framework of ranking structure recognition for preference-based data. We first develop an approach to identify dynamic ranking groups by incorporating temporal penalties into a spectral estimation for the celebrated Bradley-Terry model. To detect structural changes, we introduce an innovative objective function and present a practicable algorithm based on dynamic programming. Theoretically, we establish the consistency of ranking group recognition by exploiting properties of a random 'design matrix' induced by a reversible Markov chain. We also tailor a group inverse technique to quantify the uncertainty in item ability estimates. Additionally, we prove the consistency of structure change recognition, ensuring the robustness of the proposed framework. Experiments on both synthetic and real-world datasets demonstrate the practical utility and interpretability of our approach.
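As background on what a spectral estimate for the Bradley-Terry model can look like, the sketch below builds a random walk that moves from item i to item j in proportion to how often j beat i, then reads scores off its stationary distribution. This is a static estimator in the spirit of Rank Centrality, offered as an assumption about the general technique; it omits the temporal penalties, group recognition, and change detection that the abstract describes. The function name and the example weights are hypothetical.

```python
def spectral_scores(win_fraction, n_iter=1000):
    """Spectral Bradley-Terry scores from pairwise win fractions.

    win_fraction[i][j] is the fraction of i-versus-j comparisons won by i
    (use 0.0 on the diagonal and for pairs that were never compared).
    Returns the stationary distribution of the induced Markov chain.
    """
    n = len(win_fraction)
    # Transition matrix: from i, step to j with probability proportional
    # to how often j beat i; dividing by n keeps every row sum below 1.
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i][j] = win_fraction[j][i] / n
        # Remaining mass stays at i (a self-loop), making the chain aperiodic.
        P[i][i] = 1.0 - sum(P[i])
    # Power iteration from the uniform distribution.
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Exact Bradley-Terry win fractions for hypothetical weights 4 : 2 : 1.
w = [4.0, 2.0, 1.0]
frac = [
    [w[i] / (w[i] + w[j]) if i != j else 0.0 for j in range(3)]
    for i in range(3)
]
scores = spectral_scores(frac)
print(scores)
```

When the win fractions follow the Bradley-Terry model exactly, the chain satisfies detailed balance with respect to the normalized weights, so the stationary distribution recovers them; this reversibility is the property the abstract alludes to.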
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)